Using the pulmonary model from the SECRET competition.

Load in the design and the data:

# Packages used throughout: R.matlab (readMat), ggplot2, reshape2 (melt),
# lhs (randomLHS), mvtnorm (rmvnorm), viridis (scale_colour_viridis)
library(R.matlab)
library(ggplot2)
library(reshape2)
library(lhs)
library(mvtnorm)
library(viridis)
source('Gasp.R') # BuildGasp, ValidateGasp, etc.

design <- readRDS('data/designW1.rds')

t <- 512 # number of timepoints
v <- 3 # number of variables
n <- 100 # number of simulations

all_data <- array(0, dim = c(t,v,n))
for (i in 1:n){
  tmp <- readMat(paste0('data/output/flow', i, '.mat'))[[1]]
  all_data[,,i] <- tmp
}

Plot all data:

plot_data <- data.frame(Time = 1:t,
                        Run = rep(1:n, each = t*v),
                        Output = c(all_data),
                        Type = rep(c('Flow1', 'Flow2', 'Pressure'), each = t))

plot_data2 <- melt(plot_data, id.vars = c('Time', 'Run', 'Type'))

ggplot(plot_data2, aes(x = Time, y = value, col = as.factor(Run))) +
  geom_line() +
  facet_wrap(vars(Type), nrow = 2, scales = 'free_y') +
  theme(legend.position = 'none')

Emulate single output

Let’s select timepoint 200 from Flow1 as our output. Creating a dataframe, and plotting against one of the inputs:

design$ID <- NULL # don't need this column
tData <- data.frame(design[,1:4], y = all_data[200,1,])
head(tData)
##        kMV     alpha     lrrA     lrrV        y
## 1 151891.4 0.8349728 38.71343 29.62342 12.70690
## 2 292699.7 0.8340668 31.45554 40.85423 16.30124
## 3 218213.8 0.8719307 21.63183 45.98160 18.45270
## 4 122811.4 0.8310381 29.49230 35.13491 12.78105
## 5 223381.2 0.8681403 36.56351 24.98882 17.55068
## 6 169484.9 0.8741946 41.77370 31.13585 13.78499
ggplot(tData, aes(kMV, y)) + geom_point()

The inputs here are on very different scales. We should standardise these prior to modelling. Here, we have a fixed range for each input, so we use the min/max for each to scale these to [-1,1]:

tData$kMV <- (tData$kMV - 9*10^4) / ((3*10^5 - 9*10^4)/2) - 1
tData$alpha <- (tData$alpha - 0.83) / ((0.89 - 0.83)/2) - 1
tData$lrrA <- (tData$lrrA - 20) / ((50 - 20)/2) - 1
tData$lrrV <- (tData$lrrV - 20) / ((50 - 20)/2) - 1
summary(tData)
##       kMV                 alpha                 lrrA           
##  Min.   :-0.9950701   Min.   :-0.9808387   Min.   :-0.9966622  
##  1st Qu.:-0.5008905   1st Qu.:-0.5010414   1st Qu.:-0.4937711  
##  Median :-0.0036676   Median :-0.0004437   Median : 0.0001145  
##  Mean   : 0.0005379   Mean   : 0.0000804   Mean   :-0.0003623  
##  3rd Qu.: 0.5038140   3rd Qu.: 0.4911227   3rd Qu.: 0.5017052  
##  Max.   : 0.9856529   Max.   : 0.9953333   Max.   : 0.9858235  
##       lrrV                  y         
##  Min.   :-0.9977574   Min.   : 9.342  
##  1st Qu.:-0.4922606   1st Qu.:13.197  
##  Median :-0.0033427   Median :15.172  
##  Mean   :-0.0007766   Mean   :15.381  
##  3rd Qu.: 0.4986679   3rd Qu.:17.170  
##  Max.   : 0.9948385   Max.   :25.910
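The same rescaling can be wrapped in a small helper (a sketch; `scale_to_pm1` is a hypothetical function, not part of Gasp.R):

```r
# Hypothetical helper: map x from [lo, hi] to [-1, 1]
scale_to_pm1 <- function(x, lo, hi) {
  (x - lo) / ((hi - lo) / 2) - 1
}

# e.g. for kMV, with range [9e4, 3e5]
kMV_scaled <- scale_to_pm1(c(9e4, 1.95e5, 3e5), 9e4, 3e5)  # -1, 0, 1
```

The inverse map, (z + 1) * (hi - lo) / 2 + lo, takes scaled inputs back to the original range.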

We probably want to split into training/test sets at this point. We could do this manually, or we can handle this internally in the emulation code.

To use this code, there’s one more thing we need to do. From the notes in Gasp.R, the input data must have the structure [design, Noise, output]. This is because a noise term is used in the selection of a mean function:

tData <- data.frame(tData[,1:4],
                    Noise = runif(nrow(tData), -1, 1),
                    y = tData$y)

Now we can run the code (setting a seed for reproducibility, because this function will randomly split the data into training/validation):

set.seed(2101)
em1 <- BuildGasp('y', tData)
## The upper bounds of the range parameters are 117.4767 120.7062 121.1018 122.6679 Inf 
## The initial values of range parameters are 2.349534 2.414125 2.422036 2.453357 
## Start of the optimization  1  : 
## The number of iterations is  19 
##  The value of the  marginal posterior  function is  -108.855 
##  Optimized range parameters are 1.637116 2.617475 3.509501 3.322867 
##  Optimized nugget parameter is 0.0005770015 
##  Convergence:  TRUE 
## The initial values of range parameters are 1.762785 1.811246 1.817181 1.840681 
## Start of the optimization  2  : 
## The number of iterations is  22 
##  The value of the  marginal posterior  function is  -108.855 
##  Optimized range parameters are 1.637116 2.617475 3.509501 3.322867 
##  Optimized nugget parameter is 0.0005770015 
##  Convergence:  TRUE
summary(em1)
##                 Length Class      Mode     
## em              1      rgasp      S4       
## type            1      -none-     character
## mean_fn         0      -none-     NULL     
## train_data      5      data.frame list     
## validation_data 5      data.frame list

This has stored an rgasp emulator under $em, the mean function that was used (here none, hence NULL), and the training and validation data. If you look at Gasp.R or the raw code for BuildGasp, you’ll see there’s an input training_prop, which controls how the data are split into training and validation sets, with a default of 75% used for training.

We can validate in several ways:

par(mar = c(4,2,2,2));ValidateGasp(em1)

This function has several options. If you provide it with only an emulator, it will predict over the validation data stored in the BuildGasp object. You can alternatively provide it with a new dataset. You can also get it to plot the predictions against the inputs:

par(mfrow = c(2,3), mar = c(4,2,2,2));ValidateGasp(em1, IndivPars = TRUE)

This emulator might not be suitable: these are 95% prediction intervals, and we have 5/25 points coloured red, i.e. 20% of our predictions are outside these intervals.
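This proportion can be computed by hand from the interval bounds (a sketch with made-up numbers; ValidateGasp’s colouring rule is assumed to be “truth outside the 95% prediction interval”):

```r
# Made-up validation values for illustration
truth <- c(1.2, 3.4, 2.2, 5.0)
lower <- c(0.5, 3.6, 1.9, 4.0)
upper <- c(1.5, 4.5, 2.5, 4.8)
prop_outside <- mean(truth < lower | truth > upper)  # here 0.5
```

For a well-calibrated emulator we would expect roughly 5% of points to fall outside their 95% intervals.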

Alternatively, we can do leave-one-out across the training data:

par(mar = c(4,2,2,2));LeaveOneOut(em1)

We’re doing a bit better here, with around 5% of points falling outside their 95% intervals.

We could try something more complicated for the mean function. We could just do linear:

set.seed(5820)
em2 <- BuildGasp('y', tData, mean_fn = 'linear')
## The upper bounds of the range parameters are 104.4217 103.118 103.1081 105.4503 Inf 
## The initial values of range parameters are 2.088434 2.062359 2.062162 2.109005 
## Start of the optimization  1  : 
## The number of iterations is  22 
##  The value of the  marginal posterior  function is  -92.54723 
##  Optimized range parameters are 1.438671 1.37506 1.529972 1.708329 
##  Optimized nugget parameter is 0.0007741812 
##  Convergence:  TRUE 
## The initial values of range parameters are 1.822727 1.79997 1.799798 1.840681 
## Start of the optimization  2  : 
## The number of iterations is  21 
##  The value of the  marginal posterior  function is  -92.54723 
##  Optimized range parameters are 1.438671 1.37506 1.529972 1.708329 
##  Optimized nugget parameter is 0.0007741812 
##  Convergence:  TRUE
par(mfrow=c(1,2),mar = c(4,2,2,2));ValidateGasp(em2);LeaveOneOut(em2)

Or we could fit something more general:

set.seed(3100329)
em3 <- BuildGasp('y', tData, mean_fn = 'step')
par(mfrow=c(1,2),mar = c(4,2,2,2));ValidateGasp(em3);LeaveOneOut(em3)

This final emulator appears to validate better than the others. Here, we’ve fitted a more complicated mean surface first, so we have some new entries in our emulator object:

summary(em3)
##                 Length Class      Mode     
## em               1     rgasp      S4       
## lm              11     -none-     list     
## active           4     -none-     character
## type             1     -none-     character
## mean_fn          1     -none-     character
## train_data       5     data.frame list     
## validation_data  5     data.frame list

$lm is a fitted linear regression object, giving this more complex mean structure:

summary(em3$lm$linModel)
## 
## Call:
## lm(formula = y ~ lrrA + I(lrrA^2) + kMV + lrrV + alpha + I(kMV * 
##     lrrA) + I(lrrV * lrrA) + I(lrrV * kMV), data = tData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32718 -0.47132  0.08911  0.41029  1.72194 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     15.1490     0.1118 135.527  < 2e-16 ***
## lrrA            -3.4397     0.1323 -26.008  < 2e-16 ***
## I(lrrA^2)        0.8757     0.2647   3.309 0.001521 ** 
## kMV              2.8495     0.1253  22.744  < 2e-16 ***
## lrrV            -2.0387     0.1328 -15.348  < 2e-16 ***
## alpha            1.9137     0.1305  14.661  < 2e-16 ***
## I(kMV * lrrA)   -1.2418     0.2241  -5.541 5.65e-07 ***
## I(lrrV * lrrA)   0.9021     0.2256   3.999 0.000163 ***
## I(lrrV * kMV)   -0.6761     0.2214  -3.054 0.003252 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6281 on 66 degrees of freedom
## Multiple R-squared:  0.9581, Adjusted R-squared:  0.953 
## F-statistic: 188.7 on 8 and 66 DF,  p-value: < 2.2e-16

$active is a list of the input variables treated as active when fitting the GP to the residuals (essentially, the mean-fitting process may reveal that some inputs are just noise, so we don’t include these when fitting the covariance):

em3$active
## [1] "lrrA"  "kMV"   "lrrV"  "alpha"

Here, all 4 of the inputs are considered active, but in general, as the dimensionality of the input space increases, it’s more common for this to be a subset of the inputs.

By default, this mean function is not allowed more terms than 0.1 times the number of training points. However, this won’t always be appropriate, and the option maxdf allows flexibility in this specification (e.g., we may find we have overfitted the data and want to limit the number of terms being added, or we may find that 10% is too restrictive and we need to add more).

Emulate full output

The above emulated a single output, but this code could be used to emulate \(\ell\) outputs by looping over different outputs.

Alternatively, we could use dimension reduction to simplify this task. Let’s emulate Flow1. First, we want to construct a basis. This is easy to do (the function has much more flexibility than is usually needed, e.g., it can perform weighted SVD for general matrices):

DataBasis <- MakeDataBasis(all_data[,1,])
summary(DataBasis)
##              Length Class  Mode   
## tBasis       51200  -none- numeric
## CentredField 51200  -none- numeric
## EnsembleMean   512  -none- numeric
## Q                0  -none- NULL   
## Lambda           0  -none- NULL
dim(DataBasis$tBasis)
## [1] 512 100
dim(DataBasis$CentredField)
## [1] 512 100

By default, this function subtracts the ensemble mean, and then calculates the SVD across the (centred) data. $Q and $Lambda are only stored if this function is used for weighted SVD, hence are not returned here (and indeed, not required in future calculations).
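For the unweighted case, this construction can be sketched in base R (an assumption about MakeDataBasis’s internals: centre by the ensemble mean, then take the left singular vectors of the centred data):

```r
set.seed(1)
field <- matrix(rnorm(512 * 100), nrow = 512)  # toy stand-in for all_data[,1,]
ens_mean <- rowMeans(field)                    # ensemble mean at each timepoint
centred <- field - ens_mean                    # the CentredField analogue
tBasis <- svd(centred)$u                       # orthonormal basis vectors as columns
```

The columns of tBasis are orthonormal, which is what makes the later projection step a simple matrix product.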

Plotting the leading few vectors:

q <- 9
plot_basis <- data.frame(Time = rep(1:t, q),
                         Vector = rep(1:q, each = t),
                         Weight = c(DataBasis$tBasis[,1:q]))
ggplot(plot_basis, aes(Time, Weight)) +
  geom_line() +
  facet_wrap(vars(Vector))

Generally we truncate this basis for some small \(q\) (such that a large enough amount of variability has been explained and/or such that any later vectors are just noise, with no signal from the parameters).

q <- ExplainT(DataBasis, vtot = 0.95)
q
## [1] 4

Here, 4 vectors explain 95% of the variability in the data. In practice, we might want to try emulating the 5th, 6th, etc. coefficients to see if these are predictable, but let’s just use the first 4 for now.
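The truncation level can be reproduced from the singular values directly (a sketch, assuming variability explained is measured by the squared singular values; toy values used here):

```r
d_toy <- c(10, 6, 3, 2, 1, 0.5)            # example singular values
prop <- cumsum(d_toy^2) / sum(d_toy^2)     # cumulative proportion of variability
q_toy <- which(prop >= 0.95)[1]            # smallest q reaching 95%; here 3
```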

Alternatively, plot cumulative variance explained:

vars <- lapply(1:n, function(k) VarExplained(DataBasis$tBasis[,1:k], DataBasis$CentredField))
ggplot(data.frame(q = 1:n, Proportion = unlist(vars)), aes(x = q, y = Proportion)) +
  geom_line() +
  ylim(0,1)

Projecting the data onto the basis:

Coeffs <- Project(data = DataBasis$CentredField, 
                  basis = DataBasis$tBasis[,1:q])
colnames(Coeffs)[1:q] <- paste("C",1:q,sep="")
summary(Coeffs)
##        C1                C2                C3                C4         
##  Min.   :-63.952   Min.   :-42.022   Min.   :-25.063   Min.   :-27.300  
##  1st Qu.:-28.476   1st Qu.:-20.689   1st Qu.:-10.353   1st Qu.: -5.812  
##  Median : -5.733   Median :  0.856   Median : -4.299   Median : -1.565  
##  Mean   :  0.000   Mean   :  0.000   Mean   :  0.000   Mean   :  0.000  
##  3rd Qu.: 22.744   3rd Qu.: 19.789   3rd Qu.:  5.128   3rd Qu.:  5.334  
##  Max.   :162.433   Max.   : 50.687   Max.   : 81.680   Max.   : 27.256
tDataC <- data.frame(tData[,1:4], # getting the scaled version from before
                     Noise = runif(n, -1, 1), 
                     Coeffs)
head(tDataC)
##          kMV      alpha       lrrA         lrrV       Noise         C1
## 1 -0.4105579 -0.8342397  0.2475622 -0.358438412 -0.94629091 -57.654972
## 2  0.9304734 -0.8644406 -0.2362974  0.390282117  0.63345668 -20.741136
## 3  0.2210836  0.3976897 -0.8912116  0.732106785  0.75267760  47.747129
## 4 -0.6875105 -0.9653981 -0.3671799  0.008994258 -0.20308374 -51.121585
## 5  0.2702968  0.2713426  0.1042343 -0.667411738  0.38438111  21.977666
## 6 -0.2430006  0.4731540  0.4515797 -0.257609746  0.02633979  -5.173414
##           C2         C3         C4
## 1 -21.804896   9.724607   1.470627
## 2 -32.018533  -9.987071  10.134155
## 3   1.772807 -19.527830  -6.458404
## 4  -7.456377  15.738800 -13.113496
## 5 -18.626407  -6.811848  -6.335217
## 6  21.054044  -2.062772  -4.182515

Now we can build emulators for these \(q\) coefficients. We can do this simultaneously for all \(q\) (as long as we’re happy using the same assumptions for each). To make things consistent across each emulator, I’m going to define training/validation sets by hand, and then fit to the full training set by setting training_prop = 1 (rather than allowing the function to do this split for me):

set.seed(321)
inds <- sample(1:n, n)
train_inds <- inds[1:75]
val_inds <- inds[-c(1:75)]
train_data <- tDataC[train_inds,]
val_data <- tDataC[val_inds,]
em_coeffs <- BasisEmulators(train_data, q, mean_fn = 'step', maxdf = 5, training_prop = 1)

The output here is now a list of \(q\) emulator objects, e.g.

summary(em_coeffs[[1]])
##                 Length Class      Mode     
## em               1     rgasp      S4       
## lm              11     -none-     list     
## active           4     -none-     character
## type             1     -none-     character
## mean_fn          1     -none-     character
## train_data       5     data.frame list     
## validation_data  0     -none-     NULL
summary(em_coeffs[[2]])
##                 Length Class      Mode     
## em               1     rgasp      S4       
## lm              11     -none-     list     
## active           4     -none-     character
## type             1     -none-     character
## mean_fn          1     -none-     character
## train_data       5     data.frame list     
## validation_data  0     -none-     NULL

Now, in ValidateGasp, we need to provide the validation set (it’s no longer stored within the emulator objects):

par(mfrow=c(2,2), mar=c(4,2,2,2))
ValidateGasp(em_coeffs[[1]], val_data)
ValidateGasp(em_coeffs[[2]], val_data)
ValidateGasp(em_coeffs[[3]], val_data)
ValidateGasp(em_coeffs[[4]], val_data)

par(mfrow=c(2,2), mar=c(4,2,2,2))
LeaveOneOut(em_coeffs[[1]]);LeaveOneOut(em_coeffs[[2]]);LeaveOneOut(em_coeffs[[3]]);LeaveOneOut(em_coeffs[[4]])

Prediction

The validation plots above are performing prediction across the test data. In general, we can predict for any sets of inputs, for either a 1D emulator, or for a set of basis emulators. Doing so across a space-filling design in parameter space (here, we are still working in [-1,1]):

ns <- 1000 # usually want more, but set low for speed
BigDesign <- 2*as.data.frame(randomLHS(ns, 4)) - 1 
colnames(BigDesign) <- colnames(design)[1:4]

Preds_1D <- PredictGasp(BigDesign, em3)
Preds_basis <- BasisPredGasp(BigDesign, em_coeffs)

These store slightly different things. In the 1D case, we get mean/sd/lower95/upper95 (the same as what predict.rgasp returns):

summary(Preds_1D)
##         Length Class  Mode   
## mean    1000   -none- numeric
## lower95 1000   -none- numeric
## upper95 1000   -none- numeric
## sd      1000   -none- numeric

In the basis case, only $Expectation and $Variance are stored (this is all we need for history matching, and prediction intervals can be derived from these, so we don’t store them when we have a lot of emulators):

summary(Preds_basis)
##             Length Class  Mode   
## Expectation 4000   -none- numeric
## Variance    4000   -none- numeric

Each of these is an \(ns \times q\) matrix, where each row corresponds to an input we’re predicting at, and each column corresponds to a basis vector:

dim(Preds_basis$Expectation)
## [1] 1000    4
dim(Preds_basis$Variance)
## [1] 1000    4
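If intervals are wanted for the basis predictions, they can be derived from the stored quantities under a normal approximation (rgasp’s own intervals may instead use a Student-t, so treat this as a sketch; toy values used here):

```r
mu <- c(-20.1, 32.7)                    # example expectations
pv <- c(1.27, 2.58)                     # example variances
lower95 <- mu - qnorm(0.975) * sqrt(pv)
upper95 <- mu + qnorm(0.975) * sqrt(pv)
```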

Reconstructing

From basis coefficients (whether given by projection, or predicted by an emulator), we can reconstruct a prediction of the original field.
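For an orthonormal basis, projection and reconstruction are just matrix products (an assumption about what Project and Recon reduce to in the unweighted case; toy data used here):

```r
set.seed(2)
centred <- matrix(rnorm(512 * 100), 512)   # toy centred field
centred <- centred - rowMeans(centred)
basis <- svd(centred)$u[, 1:4]             # truncated basis, q = 4
coeffs <- t(centred) %*% basis             # n x q coefficients (projection)
recon <- basis %*% t(coeffs)               # 512 x n reconstructions
```

Projecting the reconstruction back onto the basis recovers the same coefficients, since the basis columns are orthonormal.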

Let’s consider the runs from the validation set. First produce predictions for them:

Preds_val <- BasisPredGasp(val_data, em_coeffs)

Let’s just take the first run, and compare the mean/variance to the truth (in coefficient/projection space):

data.frame(Truth = as.numeric(val_data[1,-c(1:5)]),
           Mean = Preds_val$Expectation[1,],
           Var = Preds_val$Variance[1,])
##       Truth       Mean       Var
## 1 -22.05506 -20.082626 1.2726158
## 2  31.87312  32.698060 2.5789602
## 3  12.18553  12.623414 0.9006701
## 4   3.29881   3.858779 2.6456150

Plot the true run vs emulator reconstruction of it (still with the ensemble mean removed, as this will dominate):

plot_data <- data.frame(Time = 1:t,
                        Truth = DataBasis$CentredField[,val_inds[1]],
                        Recon = Recon(Preds_val$Expectation[1,], DataBasis$tBasis[,1:q]))
plot_data2 <- melt(plot_data, id.vars = c('Time'))
ggplot(plot_data2, aes(Time, value, col = variable)) + geom_line()

Looks quite accurate (remember this run was not used for fitting the emulator).

The mean prediction by itself is not that informative, so let’s also sample from the emulators:

em_samp <- matrix(0, t, 100)
for (s in 1:100){
  samp <- rnorm(q, 
                mean = Preds_val$Expectation[1,],
                sd = sqrt(Preds_val$Variance[1,]))
  rec <- Recon(samp, DataBasis$tBasis[,1:q])
  em_samp[,s] <- rec
}
ggplot(plot_data2, aes(Time, value, col = variable)) + 
  geom_line(data = data.frame(Time = 1:t, value = c(em_samp), s = rep(1:100, each = t)), aes(Time, value, linetype = as.factor(s)), col = 'grey', alpha = 0.6) +
  geom_line(size = 1.25) +
  scale_linetype_manual(values = rep(1,100), guide = 'none')

Really, we should also add the uncertainty due to the discarded basis vectors - by truncating the basis after \(q\) vectors, we know that we’re not perfectly reconstructing the true simulations via the emulators, and need to acknowledge this additional variability. It is likely small relative to the variability we’ve explained (less than 5%) but we should still account for it.

There’s a function that will calculate this variance matrix, and we can sample from it:

extra_var <- DiscardedBasisVariance(DataBasis, q)
extra_var_samples <- rmvnorm(100, rep(0, t), extra_var)
dim(extra_var_samples)
## [1] 100 512
plot_samples <- data.frame(Time = 1:512,
                           epsilon = c(t(extra_var_samples)),
                           s = rep(1:100, each = t))

ggplot(plot_samples, aes(Time, epsilon, col = as.factor(s))) + 
  geom_line(alpha = 0.6) +
  theme(legend.position = 'none')

The above is 100 samples from the discarded vectors, and we have clear correlated structure in these (some of the later basis vectors are likely quite noisy, however these will have lower variance/eigenvalues, so are drowned out by the earlier, more structured, discarded vectors).
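The discarded-vector covariance itself can be sketched directly from the SVD (an assumption about the scaling used by DiscardedBasisVariance, which may normalise differently; toy data and names used here to avoid clobbering the objects above):

```r
set.seed(3)
toy_field <- matrix(rnorm(50 * 20), 50)            # toy data: 50 timepoints, 20 runs
toy_field <- toy_field - rowMeans(toy_field)
sv <- svd(toy_field)
q_toy <- 4
W <- sv$u[, -(1:q_toy)]                            # discarded basis vectors
lam <- sv$d[-(1:q_toy)]^2 / (ncol(toy_field) - 1)  # sample-covariance eigenvalues
disc_cov <- W %*% diag(lam) %*% t(W)               # covariance of the discarded part
```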

Adding this variability onto our emulator samples:

ggplot(plot_data2, aes(Time, value, col = variable)) + 
  geom_line(data = data.frame(Time = 1:t, value = c(em_samp + t(extra_var_samples)), s = rep(1:100, each = t)), aes(Time, value, linetype = as.factor(s)), col = 'grey', alpha = 0.6) +
  geom_line(size = 1.25) +
  scale_linetype_manual(values = rep(1,100), guide = 'none')

Before, we were missing the data in places, because the truncated basis did not have the ability to perfectly reconstruct the truth. Now, the truth lies within our uncertainty.

Just plotting 95% prediction intervals for clarity:

plot_data <- data.frame(Time = 1:t,
                        Truth = DataBasis$CentredField[,val_inds[1]],
                        Recon = rep(Recon(Preds_val$Expectation[1,], DataBasis$tBasis[,1:q]), 2),
                        Lower = c(apply(em_samp, 1, quantile, probs = 0.025), apply(em_samp + t(extra_var_samples), 1, quantile, probs = 0.025)),
                        Upper = c(apply(em_samp, 1, quantile, probs = 0.975), apply(em_samp + t(extra_var_samples), 1, quantile, probs = 0.975)),
                        Type = rep(c('EmVar', 'FullVar'), each = t))
plot_data2 <- melt(plot_data, id.vars = c('Time', 'Type'))
pal <- scales::hue_pal()(2)
ggplot(plot_data2, aes(Time, value, col = variable, linetype = variable)) +
  geom_line(size = 0.8) +
  facet_wrap(vars(Type)) +
  scale_linetype_manual(values = c(1,1,2,2)) +
  scale_colour_manual(values = pal[c(1,2,2,2)]) +
  theme(legend.position = 'none')

Or, doing this for 4 different validation runs (immediately adding in the variance from the discarded vectors):

em_samp <- array(0, dim = c(t, 100, 4))
plot_data <- NULL
for (i in 2:5){
  plot_data <- rbind(plot_data,
                     data.frame(Time = 1:t,
                                Truth = DataBasis$CentredField[,val_inds[i]],
                                Recon = Recon(Preds_val$Expectation[i,], DataBasis$tBasis[,1:q]),
                                Run = i))
  for (s in 1:100){
    samp <- rnorm(q, 
                  mean = Preds_val$Expectation[i,],
                  sd = sqrt(Preds_val$Variance[i,]))
    rec <- Recon(samp, DataBasis$tBasis[,1:q])
    em_samp[,s,i-1] <- rec
  }
}

plot_data2 <- melt(plot_data, id.vars = c('Time', 'Run'))

extra_var_samples <- t(rmvnorm(400, rep(0, t), extra_var))
dim(extra_var_samples) <- c(t, 100, 4)

plot_data_samp <- data.frame(Time = 1:t, 
                             value = c(em_samp + extra_var_samples), 
                             s = rep(1:100, each = t),
                             Run = rep(2:5, each = t*100))

ggplot(plot_data2, aes(Time, value, col = variable)) +
  facet_wrap(vars(Run)) +
  geom_line(data = plot_data_samp, aes(Time, value, linetype = as.factor(s)), col = 'grey', alpha = 0.6) +
  geom_line(size = 0.8) +
  scale_linetype_manual(values = rep(1,100), guide = 'none')

Visualising response surface

We may want to see what happens across the output space as we systematically change the values of the inputs. Here, the 2 dominant parameters are kMV and alpha, so let’s vary these.

Before, we predicted over a large Latin hypercube. Now let’s systematically vary these 2 parameters, fixing the others at the centre of their range (zero):

pred_seq <- seq(from = -1, to = 1, by = 0.04)
pred_grid <- expand.grid(kMV = pred_seq, alpha = pred_seq)
NewDesign <- data.frame(pred_grid, lrrA = 0, lrrV = 0)
PredsNewBasis <- BasisPredGasp(NewDesign, em_coeffs)

Plotting the expectation of C1:

ggplot(data.frame(NewDesign, C1 = PredsNewBasis$Expectation[,1]),
       aes(x = kMV, y = alpha, col = C1)) +
  geom_point(size = 3, shape = 15) +
  scale_colour_viridis(limits = c(-85,85))

Or C2:

ggplot(data.frame(NewDesign, C2 = PredsNewBasis$Expectation[,2]),
       aes(x = kMV, y = alpha, col = C2)) +
  geom_point(size = 3, shape = 15) +
  scale_colour_viridis()

This is a 2D surface in the 4D input space with lrrA and lrrV both fixed to zero. Instead setting them both to 1 demonstrates different behaviour in the output:

NewDesign <- data.frame(pred_grid, lrrA = 1, lrrV = 1)
PredsNewBasis <- BasisPredGasp(NewDesign, em_coeffs)
ggplot(data.frame(NewDesign, C1 = PredsNewBasis$Expectation[,1]),
       aes(x = kMV, y = alpha, col = C1)) +
  geom_point(size = 3, shape = 15) +
  scale_colour_viridis(limits = c(-85,85))

Depending on what we’re interested in (relationship on this 2D surface for specific values of other parameters, vs general behaviour integrated across all other uncertain inputs), instead of fixing the other inputs we could sample. For example, for each specific pair of \((kMV, alpha)\) values (each point on the plot), we could sample a Latin hypercube across the remaining parameters, average the emulator prediction across these points, and plot this instead.
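That averaging idea can be sketched with a toy stand-in for the emulator (plain Monte Carlo sampling used here for simplicity instead of randomLHS, and `f` is a hypothetical response, not the pulmonary model):

```r
set.seed(4)
f <- function(kMV, alpha, lrrA, lrrV) kMV + alpha + 0.5 * lrrA * lrrV  # toy response
grid <- expand.grid(kMV = c(-1, 0, 1), alpha = c(-1, 0, 1))
m <- 200
rest <- matrix(runif(2 * m, -1, 1), m, 2)    # samples of (lrrA, lrrV)
avg <- apply(grid, 1, function(p)
  mean(f(p[1], p[2], rest[, 1], rest[, 2]))) # average over the remaining inputs
```

With the real emulator, `f` would be replaced by a call to BasisPredGasp over the sampled points, and the averaged expectation plotted on the (kMV, alpha) grid as before.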